{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A4Eu9kibaOTm"
      },
      "source": [
        "# Chat with Code\n",
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Y7fr7_CCprlW8yb7BTQttUGAxEfLAAd0)\n",
        "\n",
        "This notebook demonstrates how to build a **Retrieval-Augmented Generation (RAG)** system for interacting with GitHub repositories using Nebius AI and LlamaIndex.\n",
        "\n",
        "Nebius AI provides access to many state-of-the-art LLM models. Check out the full list of models [here](https://studio.nebius.ai/).\n",
        "\n",
        "Visit https://studio.nebius.ai/ and sign up to get an API key.\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0vjJV1GgaUdL"
      },
      "source": [
        "\n",
        "\n",
        "## Install Dependencies"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4r5zC5hOVB6E"
      },
      "outputs": [],
      "source": [
        "%pip install llama-index-llms-nebius llama-index-embeddings-nebius llama-index-readers-github"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SWNHOQA2VC9h"
      },
      "outputs": [],
      "source": [
        "!pip install -U llama-index"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RsqHTHwZafJd"
      },
      "source": [
        "## Setting Up Environment Variables\n",
        "\n",
        "For security reasons, use environment variables to store API keys instead of hardcoding them."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sAio2TqqVQ3u"
      },
      "outputs": [],
      "source": [
        "# set api key in env or in llm\n",
        "import os\n",
        "os.environ[\"GITHUB_TOKEN\"]=\"Your Github Token\"\n",
        "os.environ[\"NEBIUS_API_KEY\"] = \"Your Nebius API Key\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "haQFk4B_fRfB"
      },
      "outputs": [],
      "source": [
        "import nest_asyncio\n",
        "\n",
        "nest_asyncio.apply()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4ezMsNr7alqU"
      },
      "source": [
        "## Import Required Modules\n",
        "\n",
        "We import essential modules from `llama-index` to work with Nebius AI models and GitHub repositories."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zWyXnVOHVz5k"
      },
      "outputs": [],
      "source": [
        "from llama_index.core import SimpleDirectoryReader,Settings, VectorStoreIndex, PromptTemplate\n",
        "from llama_index.embeddings.nebius import NebiusEmbedding\n",
        "from llama_index.llms.nebius import NebiusLLM\n",
        "from llama_index.readers.github import GithubRepositoryReader, GithubClient"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "B1-tCLYBarR4"
      },
      "source": [
        "## Initializing GitHub Client and Loading Repository Data\n",
        "\n",
        "Replace `owner` and `repo` with your target repository details."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "cSfYXD_VS7l2",
        "outputId": "235d4322-811c-418e-e879-612252c7dc35"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Loaded 13 documents from GitHub.\n"
          ]
        }
      ],
      "source": [
        "github_token = os.environ.get(\"GITHUB_TOKEN\")\n",
        "def initialize_github_client(github_token):\n",
        "    return GithubClient(github_token)\n",
        "\n",
        "github_client = initialize_github_client(github_token)\n",
        "\n",
        "owner = \"Arindam200\"\n",
        "repo = \"logo-ai\"\n",
        "\n",
        "loader = GithubRepositoryReader(\n",
        "    github_client,\n",
        "            owner=owner,\n",
        "            repo=repo,\n",
        "            filter_file_extensions=(\n",
        "                [\".py\", \".ipynb\", \".js\", \".ts\", \".md\"],\n",
        "                GithubRepositoryReader.FilterType.INCLUDE,\n",
        "            ),\n",
        "            verbose=False,\n",
        "            concurrent_requests=5,\n",
        ")\n",
        "\n",
        "branch = \"main\"\n",
        "\n",
        "docs = loader.load_data(branch=branch)\n",
        "\n",
        "if not docs:\n",
        "    raise ValueError(\"No documents were loaded. Check your GitHub repo and filters.\")\n",
        "\n",
        "print(f\"Loaded {len(docs)} documents from GitHub.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HpMdEfNwa4gD"
      },
      "source": [
        "## Defining RAG Completion Function\n",
        "\n",
        "This function initializes the Nebius LLM and Embedding models, creates an index, and retrieves relevant information from the loaded GitHub repository.\n",
        "\n",
        "Runs retrieval-augmented generation (RAG) on GitHub repository data.\n",
        "    \n",
        "Parameters:\n",
        "  - query_text (str): The user query for retrieving information.\n",
        "  - embedding_model (str): The embedding model to use.\n",
        "  - generative_model (str): The generative model to use.\n",
        "    \n",
        "Returns:\n",
        "  - str: The generated response."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XD_aspw8WfA5"
      },
      "outputs": [],
      "source": [
        "def completion_to_prompt(completion: str) -> str:\n",
        "  return f\"<s>[INST] {completion} [/INST] </s>\\n\"\n",
        "\n",
        "\n",
        "def run_rag_completion(\n",
        "    query_text: str,\n",
        "    embedding_model: str =\"BAAI/bge-en-icl\",\n",
        "    generative_model: str =\"deepseek-ai/DeepSeek-V3\"\n",
        "    ) -> str:\n",
        "\n",
        "    llm = NebiusLLM(\n",
        "    model=generative_model,\n",
        "    api_key=os.getenv(\"NEBIUS_API_KEY\")\n",
        "    )\n",
        "\n",
        "    embed_model = NebiusEmbedding(\n",
        "        model_name=embedding_model,\n",
        "        api_key=os.getenv(\"NEBIUS_API_KEY\")\n",
        "    )\n",
        "    Settings.llm = llm\n",
        "    Settings.embed_model = embed_model\n",
        "\n",
        "    index = VectorStoreIndex.from_documents(docs)\n",
        "\n",
        "    query_engine = index.as_query_engine(similarity_top_k=5,streaming=True)\n",
        "\n",
        "    qa_prompt_tmpl_str = (\n",
        "            \"Context information is below.\\n\"\n",
        "            \"---------------------\\n\"\n",
        "            \"{context_str}\\n\"\n",
        "            \"---------------------\\n\"\n",
        "            \"Given the context information above I want you to think step by step to answer the query in a crisp manner, incase case you don't know the answer say 'I don't know!'.\\n\"\n",
        "            \"Query: {query_str}\\n\"\n",
        "            \"Answer: \"\n",
        "            )\n",
        "\n",
        "    qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)\n",
        "    query_engine.update_prompts({\"response_synthesizer:text_qa_template\": qa_prompt_tmpl})\n",
        "\n",
        "    response = query_engine.query(query_text)\n",
        "    return str(response)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "p8RJCXodbJOn"
      },
      "source": [
        "## Running a Query on the GitHub Repository\n",
        "\n",
        "We will now ask a question about the repository's purpose."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "wzga3QICmMI4",
        "outputId": "91c6d3b4-7103-4ca0-9b43-5760d6fd16f4"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "This repository is about an AI-powered logo generator. It uses advanced algorithms to create unique logo designs tailored to user preferences. Users can customize fonts, colors, and layouts, and they gain full commercial rights to the logos they download. The project includes a user-friendly interface, supports Docker deployment, and is built with technologies like Next.js, Nebius AI, and Drizzle ORM.\n"
          ]
        }
      ],
      "source": [
        "res = run_rag_completion('What is this repository about?')\n",
        "\n",
        "print(res)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
